Retrospective in silico mutation profiling of SARS-CoV-2 structural proteins circulating in Uganda by July 2021: Towards refinement of COVID-19 disease vaccines, diagnostics, and therapeutics

The SARS-CoV-2 virus, the agent of COVID-19, caused unprecedented loss of lives and economic decline worldwide. Although the introduction of public health measures, vaccines, diagnostics, and therapeutics disrupted the spread of SARS-CoV-2, the emergence of variants poses a substantial threat. This study traced SARS-CoV-2 variants circulating in Uganda by July 2021 to inform the refinement of these medical interventions. A comprehensive in silico analysis of SARS-CoV-2 genomes detected in clinical samples collected from COVID-19 patients in Uganda revealed structural protein variants with the potential to escape detection, resist antibody therapy, or increase infectivity. The genome sequence dataset was retrieved from the GISAID database, and the open reading frames encoding the spike, envelope, membrane, and nucleocapsid proteins were translated. The resulting protein sequences were aligned and inspected for variants. The variant positions in each of the four alignment sets were mapped onto predicted epitopes as well as onto the 3D structures. Additionally, sequences within each set were clustered by family. A phylogenetic tree was constructed to assess the relationship between the encountered spike protein sequences and the Wuhan-Hu-1 wild-type, or the Alpha, Beta, Delta and Gamma variants of concern. Strikingly, the frequency of each of the spike protein point mutations F157L/Del, D614G and P681H/R was over 50%. The furin and the transmembrane serine protease 2 cleavage sites were unaffected by mutation. Whereas Delta dominated the spike sequences (16.5%, 91/550), Gamma was not detected. The envelope protein was the most conserved, with 96.3% (525/545) of sequences being wild-type, followed by the membrane protein at 68.4% (397/580). Although the nucleocapsid protein sequences varied, the variant residue positions were less concentrated in the RNA binding domains.
The dominant nucleocapsid sequence variant was S202N (34.5%, 205/595). These findings offer baseline information required for refining the existing COVID-19 vaccines, diagnostics, and therapeutics.


Introduction
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) caused the coronavirus disease 2019 (COVID-19) pandemic [1,2], which has affected millions of lives around the world and continues to cause deaths nearly three years after the emergence of the disease. The World Health Organization (WHO) estimated confirmed cumulative cases and deaths, as of 2nd September 2022, at 601,189,435 and 6,475,346, respectively, with new cases per day at 618,970 [3]. Global excess deaths associated with COVID-19 for the period January 2020 to December 2021 were estimated at 14.91 million [4]. According to the World Bank [5], COVID-19 has affected 1.6 billion workers so far, especially in the wholesale and retail, food and hospitality, tourism, transport and manufacturing industries. The key interventions that reduced COVID-19 hospitalizations and deaths were public health measures [6], COVID-19 tests [7], vaccines [8] and therapeutics [9]. Sustaining this achievement will require constant surveillance as well as evaluation of the performance of the existing interventions.
SARS-CoV-2 is a spherical virus decorated with spike, envelope and membrane structural proteins that traverse the virus envelope. Encased within the virus core is a 26 to 32 kb linear, positive-sense, single-stranded RNA genome tightly bound by nucleocapsid protein. The genome of SARS-CoV-2 has a 5'-cap structure, 17 open reading frames (ORFs) (1a, 1b, S, 3a, 3c, 3d, 3b, E, M, 6, 7a, 7b, 8, N, 9b, 9c and 14) and a 3' poly-adenine tail [13]. The 5'-cap structure protects the genome from degradation by the host cytoplasmic endonucleases. ORF 1ab is translated into a polyprotein, which is proteolytically cleaved into 16 non-structural proteins (replicase complex). Subsequent ORFs are individually transcribed into sub-genomic RNAs prior to translation [14]. The diverse roles of SARS-CoV-2 proteins have been reviewed by Yadav et al. [13]; only the roles of the structural proteins are highlighted herein. The spike protein attaches the virus particle to the cell surface receptor angiotensin-converting enzyme 2 (ACE2) [15], and its cleavage by furin as well as by transmembrane serine protease 2 (TMPRSS2) is essential for proteolytic activation of SARS-CoV-2, permitting host cell entry [16]. The envelope protein forms transmembrane ion channels [17], the membrane protein is required for virus assembly [18], and the nucleocapsid protein protects the virus genome [17]. Although the membrane protein is the most abundant virus structural protein [13], Poran et al. [19] showed that the nucleocapsid protein is the most abundant protein inside SARS-CoV-2 infected host cells.
Evidence shows that emerging SARS-CoV-2 variants [22] undermine vaccines [23,24], escape detection by diagnostic tests [25-27], or resist antibody therapies [24,28-30]. Variants are classified into lineages/families based on their observed similarity in amino acid substitution/deletion at the same mutation site(s) [31,32]. Furthermore, variants are classified according to specific attributes such as the resulting public health action, changes to receptor binding, reduced neutralization by antibodies generated against previous infection or vaccination, reduced treatment efficacy, potential diagnostic impact, or predicted increase in transmissibility or disease severity. These classes include Variants of Concern (VOC), Variants of High Consequence (VOHC), Variants of Interest (VOI) and Variants Being Monitored (VBM) [22]. Notably, the first-generation COVID-19 medical products that are losing performance against emerging variants were developed based on information acquired from the genome of the Wuhan-Hu-1 reference strain (NC_045512.2). Therefore, surveillance of SARS-CoV-2 mutation profiles would inform the refinement of the existing COVID-19 medical products so that they can effectively tackle the challenges imposed by these emerging variants. As such, the WHO advocates for "continual assessment of genomic diversity, including in antigenically important sites that may be under selection, to help identify plausible candidate sites that might affect the efficacy of serological assays" [33]. To this effect, a comprehensive retrospective analysis of the heterogeneity of SARS-CoV-2 structural proteins derived from genome sequences originating from Uganda was conducted.

Dataset retrieval
SARS-CoV-2 genome sequences (n = 600) were retrieved from the Global Initiative on Sharing All Influenza Data (GISAID) database [34]. The sequences had been submitted by MRC/UVRI & LSHTM (n = 572) and MAKCHS (n = 28), and represented the totality of the sequences from Uganda then available in GISAID. These sequences were derived from samples collected from 21st March 2020, when the first case of COVID-19 was detected in Uganda, to 24th June 2021. The dataset was accessed from 23rd June to 9th August 2021. Data processing and analyses were performed sequentially as outlined below (Fig 1).

Translation of open reading frames
SARS-CoV-2 genome sequences were translated into structural proteins using getorf software (EMBOSS, version 6.4.0.0) [35], and the translations were further validated with both Geneious Prime software (Biomatters, version 2022.2.2) [36] and the National Center for Biotechnology Information (NCBI) ORF finder [37], with the minimum ORF length set to 150 nucleotides. The tri-nucleotides ATG and UAG delineated the ORF start and stop codons, respectively. ORFs with amplicon drop-outs were removed from the analysis. Translated sequences were imported into BioEdit Sequence Alignment Editor software (Tom Hall, version 7.2.5) [38], where the sequences were inspected for defects including truncations.
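The ORF extraction step can be illustrated with a minimal sketch. It is not the getorf, Geneious Prime, or NCBI ORF finder implementation: it assumes the standard genetic code, scans only the three forward reading frames of the positive-sense genome, treats any of the three standard stop codons as a terminator, and counts ORF length from the start codon through the stop codon. The function name `find_orfs` and the toy sequence are hypothetical.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def find_orfs(genome, min_nt=150):
    """Return sorted (start, end, protein) tuples for forward-strand ORFs.

    start/end are 0-based, end-exclusive nucleotide coordinates; the ORF
    length (end - start) includes the stop codon (an assumption of this sketch).
    """
    seq = genome.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens the ORF
            elif start is not None and CODON_TABLE.get(codon) == "*":
                end = i + 3
                if end - start >= min_nt:
                    # Ambiguous codons (e.g. containing N) are translated as X.
                    protein = "".join(
                        CODON_TABLE.get(seq[j:j + 3], "X")
                        for j in range(start, i, 3)
                    )
                    orfs.append((start, end, protein))
                start = None
    return sorted(orfs)


# Toy sequence (hypothetical): one ORF of 156 nt encoding Met followed by 50 Ala.
toy_genome = "CC" + "ATG" + "GCT" * 50 + "TAA" + "GG"
toy_orfs = find_orfs(toy_genome)
```

In practice, translated sequences containing runs of X would correspond to the amplicon drop-outs that were excluded from the analysis.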

Multiple sequence alignment and analysis of variant positions
The Geneious Alignment package was used at the default settings (gap opening penalty = 12, gap extension penalty = 3, and refinement iterations = 2) for global alignment of the translated sequences. Each alignment set was exported to BioEdit, where the extent of amino acid residue variability (entropy) at every position was calculated. The entropy values generated by BioEdit for each alignment set were plotted using Microsoft Excel version 2019. Thereafter, the frequencies of the residues occupying each variant position in each alignment set were plotted using Microsoft Excel.
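The per-position entropy can be sketched as a Shannon entropy over the residues in each alignment column. The sketch below assumes natural logarithms and treats a gap character as an ordinary residue state; both are assumptions about the BioEdit calculation rather than details stated here, although natural logarithms appear consistent with the reported values (a 69.9%/30.1% split gives approximately 0.61). The function names and the example counts are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy H(x) = -sum(p * ln p) over the residues in one column."""
    counts = Counter(column)
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values())


def alignment_entropy(aligned_sequences):
    """Entropy at every position of an alignment of equal-length sequences."""
    return [column_entropy(column) for column in zip(*aligned_sequences)]


# Hypothetical counts chosen to give a 69.9% / 30.1% split among 578 sequences.
example_column = "I" * 404 + "T" * 174
example_entropy = column_entropy(example_column)

# Toy alignment: only the third position varies.
toy_profile = alignment_entropy(["MAI", "MAT", "MAI"])
```

A fully conserved column has an entropy of zero, so only variant positions appear as peaks when the profile is plotted.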

Mapping mutations within the epitopes and on the 3D structures
To gain insight into the possible effects of the encountered mutations on the performance of vaccines, immunodiagnostics and antibody-based therapies, mutated positions were mapped onto epitopes predicted in previous studies [39-42]. The mutant positions were also mapped onto 3D structures accessed from the PDB [43] and the NCBI [44] databases. RasWin Molecular Graphics software (GNU GPL, version 2.7.5.1) [45] was used for viewing the downloaded structures before water and heteroatoms were removed using the BIOVIA Discovery Studio client molecular modelling software (Dassault Systèmes, version 2021) [46]. The PDB IDs 7DDD, 6VYO, 6WJI and 7K3G were used for mapping point mutations on the spike protein, nucleocapsid N-terminal domain (NTD), nucleocapsid C-terminal domain (CTD), and envelope protein, respectively. Because the 3D structure of the membrane protein is not yet fully understood [47], it was predicted from the primary sequence using the AlphaFold2 protein 3D structure prediction software (Alphabet-DeepMind, version 2.1) [48] built into UCSF ChimeraX software (University of California, version 1.4) [49]. Afterwards, point mutations were annotated on the structures using UCSF ChimeraX.
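Mapping variant positions onto predicted epitopes amounts to an interval lookup, which can be sketched as follows. The epitope names and coordinates below are purely hypothetical placeholders for illustration and are not taken from the cited epitope predictions; the function name is likewise an assumption.

```python
def positions_in_epitopes(variant_positions, epitopes):
    """Map each variant position to the epitopes whose range contains it.

    epitopes: dict of name -> (start, end), 1-based inclusive residue numbers.
    Positions outside every epitope are omitted from the result.
    """
    hits = {}
    for position in variant_positions:
        names = [
            name
            for name, (start, end) in epitopes.items()
            if start <= position <= end
        ]
        if names:
            hits[position] = names
    return hits


# Hypothetical epitope coordinates, for illustration only.
example_epitopes = {"epitope_A": (440, 455), "epitope_B": (600, 620)}
example_hits = positions_in_epitopes([157, 446, 452, 614], example_epitopes)
```

The fraction of variant positions that fall within epitopes is then `len(example_hits)` divided by the number of variant positions.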

Classification of sequence families
The diversities of sequence families existing within each of the aligned protein sets were established. Alignments were imported to BioEdit and the sequences that were 100% identical were clustered into a family. Afterwards, the number of sequence families and the number of sequences per family (family size) were counted.
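Grouping sequences that are 100% identical into families can be sketched with a dictionary keyed on the sequence string. This is an illustrative equivalent of the manual clustering described above, not the BioEdit procedure itself; the function name and the toy sequences are hypothetical.

```python
from collections import defaultdict


def cluster_identical(sequences):
    """Group sequence IDs whose aligned sequences are 100% identical.

    sequences: dict of sequence ID -> aligned protein sequence.
    Returns a list of families (lists of IDs), largest family first.
    """
    families = defaultdict(list)
    for seq_id, sequence in sequences.items():
        families[sequence].append(seq_id)
    return sorted(families.values(), key=len, reverse=True)


# Toy example: three identical sequences and one single-residue variant.
toy_sequences = {"s1": "MFVF", "s2": "MFVF", "s3": "MFLF", "s4": "MFVF"}
toy_families = cluster_identical(toy_sequences)
family_sizes = [len(family) for family in toy_families]
```

The number of families is `len(toy_families)` and the family size is the length of each inner list.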

Phylogenetic analysis of spike protein
The relationship between the Ugandan spike protein sequences and the wt, or the Alpha, Beta, Delta and Gamma VOCs, was investigated. A representative sequence was selected from each of the Ugandan spike sequence families to generate a less crowded tree. To these sequences, the reference sequences, i.e., Wuhan-Hu-1 wt (sp|P0DTC2|SPIKE_SARS2), Alpha (QWE88920.1), Beta (QRN78347.1), Delta (QWK65230.1) and Gamma (QVE55289.1), were added. The selected sequences were then analyzed using Molecular Evolutionary Genetics Analysis (MEGA) software (Pennsylvania State University, version 11) [50]. Briefly, the sequences were first aligned with Multiple Sequence Comparison by Log-Expectation (MUSCLE) software (drive5, version 3.8.31) using the default parameters. All the identified gaps were removed. The evolutionary history was then inferred using the Neighbor-Joining and Maximum Likelihood methods with 1000 bootstrap replicates. The evolutionary distances were computed using the Dayhoff matrix-based method [51].
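The gap-removal step can be sketched as dropping every alignment column that contains a gap in at least one sequence. Interpreting "all the identified gaps were removed" as this kind of complete deletion is an assumption, and the function name and toy alignment are hypothetical.

```python
def drop_gap_columns(aligned_sequences, gap="-"):
    """Remove every alignment column that contains a gap in any sequence."""
    kept = [column for column in zip(*aligned_sequences) if gap not in column]
    if not kept:
        # Every column contained a gap: return empty sequences.
        return ["" for _ in aligned_sequences]
    return ["".join(residues) for residues in zip(*kept)]


# Toy alignment: columns 2 and 3 each contain a gap and are dropped.
toy_alignment = ["MF-V", "MFLV", "M-LV"]
gap_free = drop_gap_columns(toy_alignment)
```

All sequences keep the same length after this step, which distance-based tree building requires.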

Ethics statement
This study was approved by the MAKCHS School of Biomedical Sciences Research Ethics Committee (Approval number: SBS-2021-38) and the Uganda National Council for Science and Technology (Approval number: HS1706ES). Given that secondary data were used, the need for consent was waived by the MAKCHS School of Biomedical Sciences Research Ethics Committee.

Mutations on SARS-CoV-2 structural proteins
Six hundred genome sequences, which constituted 0.03% of the total SARS-CoV-2 genome sequences in the GISAID nucleotide database (n = 2,264,896) as of 9th August 2021, were downloaded. Five hundred and seventy-two (95.3%) of the sequences originated from MRC/UVRI & LSHTM, and 28 (4.7%) from MAKCHS. Full-length ORFs coding for the spike (n = 550), envelope (n = 545), membrane (n = 580), and nucleocapsid (n = 594) proteins were translated into the respective proteins. On inspection of the translated protein sets, N-terminally truncated membrane (n = 2) and nucleocapsid (n = 1) protein sequences were discovered. These three defective sequences were dropped from the analysis, reducing the number of membrane and nucleocapsid protein sequences to 578 and 593, respectively. Following alignment, positions harbouring variant amino acid residues were detected in the spike (n = 137), envelope (n = 3), membrane (n = 9), and nucleocapsid (n = 68) protein sequences. An entropy (H(x)) plot was generated to display the positions harbouring variant residues, with the degree of variability shown as spiking bars (Fig 2). Peak heights at the spike protein variant positions ranged from 0.01329 to 1.07852. The S1 subunit, particularly the NTD, the receptor binding domain (RBD) and the neighbouring region proximal to the S1/S2 junction, had a higher density of tall peaks than the S2 subunit (Fig 2A). Specifically, conspicuous peaks were observed at positions  (Fig 2B). The membrane protein had a few randomly distributed variant positions with peak heights ranging from 0.01273 to 0.61174. The tallest peak was located in transmembrane domain 3 (TM3) at position 82 (I82T, entropy value 0.61174). Interestingly, neither the TM1-TM2 nor the TM2-TM3 junction bore a point mutation (Fig 2C). Peak heights at the variant positions of the nucleocapsid protein sequence ranged from 0.01245 to 0.89423 (Fig 2D).  Fig 3A).
None of the three variant positions on the envelope protein (L21F, S68P and P71L) had less than 90% wt residue occupancy, whereas position 82 (I82T; 69.9% I and 30.1% T) on the membrane protein sequence had 70% wt residue occupancy (Fig 3B and 3C).

Mapping variant positions within the epitopes and spatial location on 3D structures
An assessment was conducted to predict the possible effects of the encountered point mutations on vaccines, immunoassays, and immunotherapies on the basis of their locations within the epitope sequences as well as on the 3D structures of each of the four structural proteins. Indeed, most mutated positions occurred within the predicted epitopes. We found that 91/137 mutated positions on the spike protein were located within the epitopes; the envelope had 3/3, the membrane had 6/9, and the nucleocapsid had 46/68. Two spike variants, G446V and L452R, previously shown to resist convalescent sera and monoclonal antibody therapies, respectively [52], are surprisingly located within the predicted T-cell epitopes [40]. This information is summarized in Table 1.
Fig 3 legend (partial): Heights of pink, grey and yellow bars indicate the frequency of amino acid residue variants 1, 2 and 3, respectively. The arrows mark positions where a substantial reduction in the frequency of wt-like residues occurred. The red, orange and yellow arrows mark positions where the frequency of wt residues is below 50%, at 60±5%, and at 70±5%, respectively. It is worth noting that at any variant position there is a Wuhan-type residue and a substitution and/or Del, referred to as the wild-type residue and the variant residue type, respectively. As an example, at position 69 of the spike protein sequences analyzed, 486 sequences bore the Wuhan-type residue (wild-type residue), one sequence had the H69Y substitution (variant residue type 1) and 63 sequences had H69Del (variant residue type 2). Numerical assignment of variant residue types 1, 2 and 3 was random.
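The wt residue occupancy behind the arrow categories can be worked through with a short sketch that uses the position 69 counts given in the legend. How the boundaries between categories are handled (for example, exactly 65%, or values between 50% and 55%) is an assumption of this sketch, and the function names are hypothetical.

```python
def wt_occupancy(counts, wt_label="wt"):
    """Percentage of sequences carrying the wild-type residue at one position."""
    total = sum(counts.values())
    return 100.0 * counts[wt_label] / total


def arrow_colour(occupancy_pct):
    """Arrow category for a wt occupancy, following the legend's thresholds.

    Assumed boundary handling: exactly 65% is treated as orange, and values
    outside all three bands (including 50-55%) receive no arrow.
    """
    if occupancy_pct < 50:
        return "red"
    if 55 <= occupancy_pct <= 65:
        return "orange"
    if 65 < occupancy_pct <= 75:
        return "yellow"
    return None


# Spike position 69, counts taken from the legend: 486 wt, 1 H69Y, 63 H69Del.
spike_69 = {"wt": 486, "H69Y": 1, "H69Del": 63}
spike_69_occupancy = wt_occupancy(spike_69)
```

With these counts, the wild-type residue occupies 486 of 550 sequences (about 88.4%), so position 69 would not receive an arrow, whereas an occupancy of 69.9%, as reported for membrane position 82, falls in the yellow band.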
Next, we mapped the variant sites onto the 3D structures (Fig 4). The structure used for locating variant sites on the spike protein (PDB ID: 7DDD) contained 110/137 (80%) of the variant positions. Whereas 55 of these variant sites were solvent exposed, 55 were located inside the protein core. The exposed variant sites were dispersed all over the spike protein, but their density was higher at the periphery of the dorsal surface (Fig 4A). The envelope protein had three variant sites; however, only L21F could be located on the structure (PDB ID: 7K3G), on the luminal surface (Fig 4B). Due to the lack of available experimental structures, the structure of the membrane protein was predicted using AlphaFold. The structural model was considered reliable based on the relatively high AlphaFold model confidence (pLDDT) scores (S1 Fig). All nine variant positions could be located on the predicted structure. S2F is located on the extravirion NTD; A69S, V70F, I82T and A85S are located on the transmembrane helix; and M109I, H155Y/N and Q185H are located on the intravirion CTD (Fig 4C). For the nucleocapsid protein, 36.8% (25/68) of the variant sites were located. Fourteen of these 25 variant sites were located on the NTD (PDB ID: 6VYO), and 11 were located on the CTD (PDB ID: 6WJI) (Fig 4D). While 11 of the 14 variant sites located on the NTD were surface exposed, 10 of the 11 variant sites on the CTD were surface exposed.

PLOS ONE
Mutation profiling of SARS-CoV-2 structural proteins circulating in Uganda
Table 1 notes: The underlined amino acid residues located on epitope sequences have undergone either substitution or deletion. Potentially influential positions annotated on the 3D structures are coloured. (a) Mutation (G446V) shown to resist convalescent serum [52]. (b) Mutation (L452R) shown to resist neutralizing monoclonal antibody [52]. https://doi.org/10.1371/journal.pone.0279428.t001

Phylogeny of the spike protein
The relationship between the Ugandan SARS-CoV-2 spike protein sequences (n = 550) and the Wuhan-Hu-1 wt (P0DTC2) reference strain or the VOCs Alpha (QWE88920.1), Beta (QRN78347.1), Delta (QWK65230.1) and Gamma (QVE55289.1) was assessed in order to establish which strain(s) were circulating in Uganda. Ugandan sequences clustered with Wuhan-Hu-1 wt and with the Alpha, Beta, and Delta VOCs, but not with Gamma (S2 Fig). Two hundred and fifty-six sequences (from 62 families) clustered with Wuhan-Hu-1 wt, 133 sequences (from 20 families) clustered with the Delta VOC, 17 sequences (from 10 families) clustered with the Alpha VOC (orange-brown dot), and 13 sequences (from six families) clustered with the Beta VOC. A further 131 sequences (from 43 families) clustered neither with Wuhan-Hu-1 wt nor with any of the three spike VOCs (Alpha, Beta, and Delta). Of these un-clustered sequences, 14 (from 7 families) strongly clustered with family 79, which comprised 22 sequences.

Discussion
The genome of the SARS-CoV-2 virus has accumulated several mutations [53], which have decreased the performance of diagnostics and therapeutic antibodies. For these reasons, refinement of first-generation COVID-19 medical products in tandem with emerging virus variants is required. Using SARS-CoV-2 structural protein sequences originating from Uganda as a case study, retrospective profiling was conducted to ascertain the degree of heterogeneity that arose between March 2020 and June 2021. Although mutations affected multiple positions on each of the structural proteins, the spike and nucleocapsid proteins were the most affected. The S1 subunit of the spike protein (mostly the NTD, the RBD and the region upstream of the S1/S2 junction) was more affected than the S2 subunit, in agreement with Jia and Gong [54]. The volatility of residues located on the more exposed S1 subunit allows the virus to thwart antibody neutralization [55], thereby promoting transmission [56]. Unlike S1, the concealed S2 subunit [57] was more conserved. Owing to its crucial roles in stabilization of the spike protein architecture [57] and host cell membrane fusion [13,58], extensive mutation of S2 could be detrimental to the virus. Similarly, the high conservation of both the furin and TMPRSS2 cleavage sites reflects their crucial role in proteolytic activation of SARS-CoV-2 for host cell entry; mutation of either of these two sites is lethal [16]. Hence, the invariant S2 subunit offers a better opportunity than S1 for the development of cross-reactive protein-based vaccines, immunoassays and immunotherapies. However, the inaccessibility of S2 to large therapeutic molecules may present a problem. Low molecular weight therapeutic compounds such as nanobodies [59] and antimicrobial peptides [60] can overcome obstructed access to such buried targets on the virus. Unlike spike or nucleocapsid, the envelope and membrane proteins were conserved throughout their entire lengths. Conservation of both the envelope and membrane proteins is expected given their concealment from neutralizing antibodies. Moreover, these two proteins are relatively short [47], meaning that their mutation can result in functional impairment leading to loss of viral fitness, as shown by Verdiá-Báguena et al. [61], who found that mutations N15A and V25F impair the ion conductivity of the envelope protein. The invariant nature of the envelope and membrane proteins offers suitable targets where cross-reactivity is required. For the nucleocapsid protein, the RNA binding domains were conserved although their flanking regions were variable, as previously reported [62]. The relative conservation of the RNA binding domains is attributable to strict selection for specific RNA-interacting residues, which is not the case for residues located in the variable regions. A consequence of mutation of the nucleocapsid is antigenic drift, which has led to false-negative results from nucleocapsid-based commercial tests [27,63]. Assessment of the performance of panels of nucleocapsid-based reagents on recombinant forms of the predominant variants documented in this study is therefore highly recommended.
The outstanding variant positions on the spike protein were F157L/Del in the NTD, D614G located distal to the RBD, and P681H/R located proximal to the S1/S2 furin cleavage site. The F157L/Del variant characterizes SARS-CoV-2 lineage A.23.1, which was detected in Uganda [64]. The dominance of F157L/Del can presumptively be linked to immune escape given its location on a predicted epitope published elsewhere [42]. On the other hand, co-evolution of F157L/Del with the P681H/R variant, which is known to promote cell membrane fusion [65], could have enhanced infectivity, resulting in proliferation of the variant. The global dominance of D614G was reported earlier [66,67], and it is associated with increased infectivity [66] ascribed to a re-configured RBD, which favors ACE2 receptor binding [67]. Given that existing vaccines, diagnostics and neutralizing antibody panels were raised against Wuhan-Hu-1 wt targets, extensive validation of these products on the F157L/Del, D614G and P681H/R variants is highly recommended. Although they have not surpassed the wt, the V367F and Q613H spike variants require follow-up because of their apparently rising levels. The apparent increase in the frequency of the V367F spike variant contradicts reports that it is sensitive to neutralizing antibodies [52]. Low herd immunity at the start of the pandemic, in combination with co-evolution of V367F with the fusion-promoting P681H/R variants, could explain the observed sharp rise. The sharp rise in the Q613H variant is speculated to be associated with increased transmissibility following re-configuration of the RBD, as with D614G, and with co-evolution with the P681H/R variant, which promotes cell fusion. Apart from the dominant spike variants, the underrepresented variants encountered require close monitoring to avert a possible build-up into the next pandemic. For example, Li et al. [52] have shown that the low-frequency L452R variant, located on predicted T-cell epitopes [40], resists antibody neutralization, indicating that the variant is capable of proliferating to epidemic levels. The envelope protein variants (L21F, S68P and P71L) were markedly less frequent than the wt. The P71L variant, which was the most predominant among the envelope protein variants, co-evolves with the Beta spike VOC [68], and the L21F variant, the second most predominant, co-evolves with the Eta spike variant of interest [69]. Thus, association with highly transmissible spike variants explains the relatively high proliferation of the P71L and L21F envelope sequence variants. The membrane protein had nine variant positions, with I82T being the most highly represented (Fig 3C). Co-evolution of the I82T sequence variant with the highly transmissible Delta spike variant [70] explains its high frequency. Notably, the frequencies of the other membrane protein variants were extremely low irrespective of their topology (Figs 2C and 4C) and location within epitopes (Table 1), meaning that these two attributes may not have much influence on their propagation. The most predominant nucleocapsid protein variant, S202N, co-evolves with the highly transmissible lineage A.23.1 spike variant [64], and the second most predominant variant, R203K/M, co-evolves with the Theta, Omicron and Delta spike variants [71]. The high proliferation of S202N and R203K/M, both of which co-evolve with highly transmissible spike variants, shows the positive influence that the spike protein has on the propagation of other SARS-CoV-2 structural proteins. Collectively, these observations suggest that mutation of the spike protein to a highly transmissible variant drives amplification of remotely located co-evolving variants.
Protein sequences were grouped by family based on 100% identity of residues at every position. The spike protein formed the most diverse clusters, totaling 141 families, followed by the nucleocapsid (n = 81), membrane (n = 11) and envelope (n = 4) proteins. Of the spike protein sequence families, the typical Wuhan-Hu-1 wt family had unexpectedly few members (n = 24), representing 4.36% (24/550) of all the spike sequences recorded. The Wuhan-Hu-1 wt family was present at the beginning of the pandemic and had become quiescent by October 2020. The low frequency of the Wuhan-Hu-1 wt spike sequence is mostly attributed to the interruption of transmission by the strict implementation of public health measures at the very beginning of the pandemic. As the wt strain waned, new spike variants emerged and became dominant, causing mild to severe disease. The new variants were able to spread rapidly for two major reasons: (1) laxity in the implementation of public health measures, and (2) resistance to herd immunity induced by natural exposure as well as by vaccination. The latter can be overcome by accelerating vaccine coverage using next-generation cocktail vaccines derived from spike variants. The majority of envelope protein (96.3%) and membrane protein (68.4%) sequences were wt. Next to the Wuhan wt, the outstanding membrane protein sequence family was the I82T variant. The I82T sequence variant co-evolves with the highly transmissible Delta spike variant, explaining its high prevalence in the population. As with spike, the typical Wuhan-Hu-1 nucleocapsid protein sequence was poorly represented, accounting for 5.9% (35/593) of all the sequences. This typical Wuhan-Hu-1 nucleocapsid protein sequence could no longer be detected by September 2020, coinciding with the disappearance of the Wuhan-Hu-1 wt spike protein sequence and signifying that SARS-CoV-2 viruses possessing the parent spike and nucleocapsid proteins may have lost fitness in the course of the pandemic.
The nucleocapsid sequence variant S202N formed the most predominant sequence family, with 205 members (34.5%). As noted earlier, the S202N sequence variant is highly amplified owing to co-evolution with the highly transmissible lineage A.23.1 spike variant. Collectively, the observed rapid evolution, particularly of the spike and nucleocapsid sequences, calls for rapid refinement of Wuhan-Hu-1 wt based vaccines, diagnostics and immunotherapies to incorporate the predominant and fixed sequence families and so keep pace with virus evolution. Where target conservation is required for cross-reactivity, the envelope and membrane proteins are suitable candidates. However, the use of vaccines, diagnostics and therapeutics designed from wt sequence information should not be discouraged until concrete proof has been gathered through repeated experimental evidence.
The evolutionary relationship between the circulating spike protein sequences and Wuhan-Hu-1 wt or the Alpha, Beta, Gamma and Delta VOCs was assessed. The majority of sequences clustered with the Wuhan-Hu-1 spike, followed by Delta, Alpha, and Beta. Gamma and related sequences were absent from the 550 sequences examined. There were other large groups of sequences that clustered neither with Wuhan-Hu-1 wt nor with the VOCs. The observed sequence clustering pattern is not surprising. The majority of the sequences circulating in Uganda were closely related to Wuhan-Hu-1 wt, given that the sequence dataset was from samples collected during the first wave and immediately before the second wave, when mutations were not yet extensive. Besides, at the beginning of the pandemic there was mandatory hospitalization and intensive case surveillance, which allowed collection of many Wuhan-Hu-1 wt related sequences. Also encountered in the dataset were the typical VOCs and related sequences, which emerged later in the course of the pandemic. These VOCs appeared in the trough between the crests of the first and second waves [72]. The Alpha and Beta clusters were much smaller than the Delta cluster. The appearance of the Alpha and Beta VOCs coincided with the time when Uganda was observing strict public health measures, which limited their transmission and consequently diminished their population. Also, subclinical infections, which did not lead to hospitalization, accounted for the low recovery of SARS-CoV-2 variants causing mild infections. On the other hand, the Delta variant entered Uganda several months after the first lockdown, when the biosecurity measures were no longer being maximally observed. This factor led to rapid transmission of the SARS-CoV-2 variants circulating at the time, the Delta variant included, culminating in a second infection wave. Moreover, the virulent nature of the Delta spike variant led to massive hospitalization, maximising the chances of sample collection for sequencing.
It can be argued that the public health measures instituted at the beginning of the pandemic, followed by the implementation of the vaccination programme, greatly influenced the transmission of SARS-CoV-2 wt and other variants in Uganda. Therefore, the dynamics of SARS-CoV-2 variants described herein reflect the situation in Uganda, which may differ sharply from that in other countries.

Conclusion
We showed that the SARS-CoV-2 viruses circulating in Uganda within the study period had heterogeneous structural proteins. Firstly, the findings of this surveillance study will contribute to the body of knowledge required for research and development of COVID-19 next-generation medical products targeting emerging SARS-CoV-2 strains. Secondly, the study provides baseline data for evaluating and measuring the evolution of SARS-CoV-2 variants over time. Thirdly, the investigation highlighted the dynamics of SARS-CoV-2 structural protein variants, which would guide policy makers on the choice of vaccines, test platforms and therapeutics befitting the SARS-CoV-2 strains in circulation. The study was limited by the number of sequences analyzed, which was below the total number of COVID-19 reported cases in Uganda (n = 1,249) as of 9th August 2021, available at [72]. Firstly, it is recommended that a global sequence dataset representing the cases that occurred in the country be analyzed. Secondly, experimental data should be generated to foster an evidence-based understanding of the impact of the encountered SARS-CoV-2 structural protein mutations on the course and control of COVID-19. Thirdly, data from this study may not reflect the situation in other countries; therefore, it is recommended that every country perform a comprehensive analysis of SARS-CoV-2 mutation trends.
S2 Fig (caption, partial): Ugandan spike protein sequences clustered with Wuhan-Hu-1 wt (green dot) and three VOCs, namely Alpha (orange-brown dot), Beta (blue dot), and Delta (red dot), but not Gamma (magenta dot). Numbers on the branches are bootstrap values. Crowding of the tree was avoided by showing only bootstrap values >50. There were sequences that clustered neither with the Wuhan-Hu-1 reference strain nor with the VOCs (n = 131). Some of these "un-clustered" sequences formed a separate cluster around family 79 (yellow dot). Thus, from largest to smallest, the clusters were the Wuhan cluster (n = 256), Delta VOC cluster (n = 133), Alpha VOC cluster (n = 17), and Beta VOC cluster (n = 13). (PDF)